Comments for MEDB 5501, Week 7

Two-sample t-test (Independent-samples t-test)

  • Randomized trial
    • Convenience sample
    • Random assignment to treatment, control
    • Measure continuous outcome
  • Cohort design
    • Observe exposed and control subjects
    • No random assignment
    • Measure continuous outcome

Assumptions

  • Groups 1 and 2 both normally distributed
    • Assessed with histograms, boxplots, Q-Q plots
    • Or rely on Central Limit Theorem
  • Possibly different means, but same variance
    • Assessed with boxplot, descriptive statistics
    • Also Levene’s test (not recommended)
  • Observations are independent
    • Between groups
    • Within groups
    • Assessed qualitatively
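
To see where the "same variance" assumption enters, here is a minimal sketch of the equal-variance t statistic using only the Python standard library. The course itself works in SPSS dialog boxes, so the function name and the small data values are made up for illustration.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(group1, group2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two sample means.
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / std_error, n1 + n2 - 2

# Made-up illustrative values, not from any course data set.
t_stat, df = pooled_t_statistic([10.0, 12.0, 14.0], [20.0, 25.0, 30.0])
print(t_stat, df)
```

The single pooled variance is what makes badly unequal group variances a concern for this version of the test.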

Normality

  • Assess each group separately, or
  • Combine after subtracting means
  • Less concern with normality for large sample sizes
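
The "combine after subtracting means" option can be sketched as follows. This is an illustration in Python rather than the SPSS steps used in class; the function name and data values are hypothetical.

```python
from statistics import mean

def pooled_residuals(group1, group2):
    """Subtract each group's own mean, then combine into one list."""
    m1, m2 = mean(group1), mean(group2)
    return [x - m1 for x in group1] + [x - m2 for x in group2]

# Made-up illustrative values, not from any course data set.
group1 = [10.0, 12.0, 14.0]
group2 = [20.0, 25.0, 30.0]
residuals = pooled_residuals(group1, group2)
print(residuals)
```

The combined residuals can then go into a single histogram or Q-Q plot, which gives more data per plot than assessing each group separately.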

Equal variances (homoscedasticity)

  • Compare the box part of the box plots
    • Look for large disparities only (2- or 3-fold)
  • Calculate and compare the standard deviations
    • Again, large disparities only
  • Levene’s test (not recommended)
    • Too little power for small sample sizes
    • Too much power for large sample sizes
    • Very sensitive to normality assumption
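
A minimal sketch of the "calculate and compare the standard deviations" check. The cutoff of 2 is an assumption taken from the "2 or 3 fold" guideline above, and the function name and data are hypothetical.

```python
from statistics import stdev

def sd_ratio(group1, group2):
    """Ratio of the larger sample standard deviation to the smaller."""
    s1, s2 = stdev(group1), stdev(group2)
    return max(s1, s2) / min(s1, s2)

# Made-up illustrative values, not from any course data set.
ratio = sd_ratio([10.0, 12.0, 14.0], [20.0, 25.0, 30.0])
# Assumed cutoff: flag only disparities of two-fold or more.
large_disparity = ratio >= 2.0
print(ratio, large_disparity)
```

This is a rough screen, not a formal test, which is consistent with the advice above against relying on Levene's test.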

Independence

  • Assessed qualitatively
  • Independence between groups
    • No matching
    • No longitudinal measures
  • Independence within groups
    • No cluster effects
    • No infectious spread

Housing data dictionary, 1 of 5

source: 
  This file was found originally at a website 
  DASL (Data And Story Library) that is no 
  longer available. 

description:  
  The original source describes the data as
  "a random sample of records of resales of 
  homes from Feb 15 to Apr 30, 1993 from the
  files maintained by the Albuquerque Board 
  of Realtors. This type of data is 
  collected by multiple listing agencies in
  many cities and is used by realtors as an
  information base."

Housing data dictionary, 2 of 5

copyright:  
  Unknown. You should be able to use this data for
  individual educational purposes under the Fair Use
  guidelines of U.S. copyright law.

format: 
  delimiter: space
  varnames: first row of data
  missing-value-code: *
  rows: 117
  columns: 8

Housing data dictionary, 3 of 5

vars:
  Price:
    label: Selling price
    unit: dollars
    
  SquareFeet:
    label: Living space
    unit: square feet
    
  AgeYears:
    label: Age of home
    unit: years

Housing data dictionary, 4 of 5

  NumberFeatures:
    label: 
      Home features (dishwasher, refrigerator,
      microwave, disposer, washer, intercom, 
      skylight(s), compactor, dryer, handicap
      fit, cable TV access)
    scale: count
    range: 0 to 11  
    
  Northeast:
    label: Located in northeast sector of city?
    values:
      Yes: 1
      No: 0

Housing data dictionary, 5 of 5

  CustomBuild:
    label: Custom built?
    values:
      Yes: 1
      No: 0
    
  CornerLot:
    label: Corner location?
    values:
      Yes: 1
      No: 0

  Tax:
    label: Yearly property tax
    unit: dollars
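
For anyone working outside SPSS, the format section of the data dictionary (space delimiter, variable names in the first row, * for missing values) translates into something like the sketch below. The file itself is not read here; the two data rows are made up, and the column order is an assumption based on the order of the data dictionary.

```python
import io

MISSING = "*"  # missing-value code from the data dictionary

def read_space_delimited(handle):
    """Read a space-delimited file whose first row holds variable names."""
    names = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Convert the missing-value code to None and everything else to float.
        rows.append({name: (None if value == MISSING else float(value))
                     for name, value in zip(names, fields)})
    return rows

# Two made-up rows for illustration only; these are not values from the real file.
sample = io.StringIO(
    "Price SquareFeet AgeYears NumberFeatures Northeast CustomBuild CornerLot Tax\n"
    "100000 1500 10 4 1 0 0 900\n"
    "85000 1200 * 3 0 0 1 *\n"
)
homes = read_space_delimited(sample)
print(len(homes), homes[1]["AgeYears"])
```

With the real file, a reasonable first check is that the result has 117 rows and 8 variables, as the dictionary states.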

Split file dialog box

Housing analysis

Price histogram, 1 of 2

Price histogram, 2 of 2

Stop grouping, using the Split file dialog box

Population pyramid

Aggregate data dialog box

Compute variable dialog box

Q-Q plot for all of the data

Boxplots

Should you consider a log transformation?

  • Yes because
    • Data bounded below by zero
    • Some evidence of non-normality
    • Some evidence of unequal variation
      • Unbalanced sample sizes
  • No because
    • Some evidence of normality
      • Sample size is large
    • Differences in variation not too extreme
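
One way to see why a log transformation can help with unequal variation: when the spread grows in proportion to the size of the values, the logs have similar spread. The sketch below uses made-up values chosen to show this cleanly, and natural logs are an assumption (any base behaves the same way here).

```python
import math
from statistics import stdev

def log_values(values):
    """Natural log of each value; only defined for positive data."""
    if min(values) <= 0:
        raise ValueError("log transformation requires positive values")
    return [math.log(v) for v in values]

# Made-up illustrative values: the second group is ten times the first.
low = [100.0, 200.0, 400.0]
high = [1000.0, 2000.0, 4000.0]

raw_ratio = stdev(high) / stdev(low)
log_ratio = stdev(log_values(high)) / stdev(log_values(low))
print(raw_ratio, log_ratio)
```

Real data will not be this tidy, so the boxplots before and after the transformation remain the practical check.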

Compute Variable dialog box

Boxplots of the log-transformed variables

Independent Samples T-Test dialog box

Define Groups dialog box

T-test output, 1 of 5

T-test output, 2 of 5

T-test output, 3 of 5

T-test output, 4 of 5

T-test output, 5 of 5

Univariate model dialog box

Univariate model dialog box

General linear model output, 1 of 3

General linear model output, 2 of 3

General linear model output, 3 of 3

Compare means dialog box

Compare means output

Geometric means

Geometric standard deviations

General linear model for log transformed data

Back-calculated statistics

Back-calculated confidence intervals, 1 of 2

Back-calculated confidence intervals, 2 of 2
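
The geometric mean, geometric standard deviation, and back-calculated interval can be sketched in a few lines, assuming the analysis used natural logs. The function names and numbers are hypothetical; in class these values come from the SPSS output for the log-transformed data.

```python
import math
from statistics import mean, stdev

def geometric_mean(values):
    """Exponentiate the mean of the natural logs."""
    return math.exp(mean(math.log(v) for v in values))

def geometric_sd(values):
    """Exponentiate the standard deviation of the natural logs."""
    return math.exp(stdev([math.log(v) for v in values]))

def back_transform_interval(log_diff, log_lower, log_upper):
    """Turn a log-scale difference and its limits into a ratio and its limits."""
    return math.exp(log_diff), math.exp(log_lower), math.exp(log_upper)

# Made-up illustrative values, not from any course data set.
gm = geometric_mean([1.0, 10.0, 100.0])
gsd = geometric_sd([1.0, 10.0, 100.0])
ratio, lower, upper = back_transform_interval(0.0, -0.1, 0.1)
print(gm, gsd, ratio, lower, upper)
```

After back-transformation, the difference in log means becomes a ratio of geometric means, so "no difference" corresponds to a ratio of 1 rather than a difference of 0.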

Conceptual formula for sample size justification, 1 of 2

Conceptual formula for sample size justification, 2 of 2

Moon data dictionary, 1 of 4

---
data_dictionary: moon.txt

source:
  This data file is part of OzDASL, an archive of various data sets 
  useful for teaching. The maintainers of the data archive are from
  Australia, but this particular data set is not specific to that part
  of the world. The entire archive is at
  https://gksmyth.github.io/ozdasl/

Moon data dictionary, 2 of 4

description:
  This data set shows a perceptual experiment where subjects were asked
  to estimate a size ratio with their head level to the ground and then
  with their head elevated (in other words, looking upward). Although
  the objects being compared were the same size, almost all subjects 
  overestimated the relative sizes. The hypothesis to be tested is 
  whether the overestimation is greater with eyes level than with eyes
  elevated. A more detailed description is available at
  https://gksmyth.github.io/ozdasl/general/moon.html
  
download:
  https://gksmyth.github.io/ozdasl/general/moon.txt

Moon data dictionary, 3 of 4

copyright:  
  Unknown. You should be able to use this data for individual 
  educational purposes under the Fair Use guidelines of U.S. 
  copyright law.

format: 
  delimiter: tab
  varnames: first row of data
  missing-value-code: not needed
  rows: 10
  columns: 3

Moon data dictionary, 4 of 4

vars:
  Subject:
    label: Subject number
    format: numeric

  Elevated:
    label: Perceived ratio with eyes elevated
    format: numeric
    
  Level:
    label: Perceived ratio with eyes level
    format: numeric
---

Three things you need for a sample size justification

  • Hypothesis
  • Standard deviation
  • Minimum clinically important difference

Rule of 16

  • n per group = 16 / ES^2
  • Examples:
    • ES = 0.5, n per group = 64
    • ES = 0.1, n per group = 1,600
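
The Rule of 16 is easy to script. The sketch below assumes that ES is the standardized effect size, meaning the minimum clinically important difference divided by the standard deviation; the rule itself is the usual approximation for 80% power with a two-sided alpha of 0.05. The function names are hypothetical.

```python
import math

def effect_size(mcid, sd):
    """Standardized effect size: minimum clinically important difference / SD."""
    return mcid / sd

def rule_of_16(es):
    """Approximate sample size per group, rounded up to a whole subject."""
    if es <= 0:
        raise ValueError("effect size must be positive")
    return math.ceil(16 / es ** 2)

print(rule_of_16(0.5))
print(rule_of_16(0.1))
# Hypothetical inputs: a difference of 5 units with a standard deviation of 10.
print(rule_of_16(effect_size(5, 10)))
```

The first two calls reproduce the examples above (64 and 1,600 per group). Halving the effect size quadruples the required sample size.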

Sample size calculation dialog box

Sample size calculation output

Sample size calculation output for second scenario
